Abstract



Response latency - the time taken to initiate or complete an action or task - is one of the principal 
measures used to investigate the mechanisms subserving human and animal cognitive processes. The right 
tails of response latency distributions have received little attention in experimental psychology. This is 
because such very long latencies have traditionally been considered irrelevant for psychological processes;
instead, they are expected to reflect 'contingent' neural events unrelated to the experimental question. 
Most current theories predict the right tail of response latency distributions to decrease exponentially 
[1-5]. In consequence, current standard practice recommends discarding very long response latencies as
'outliers' [3, 4]. Here, I show that the right tails of response latency distributions always follow a power-law
with a slope of exactly two. This entails that the very late responses cannot be considered outliers.
Rather they provide crucial information that falsifies most current theories of cognitive processing with 
respect to their exponential tail predictions. This exponent constitutes a fundamental constant of the 
cognitive system that groups behavioral measures with a variety of physical phenomena. 

A pervading assumption in the literature is that Response Latencies (RLs) follow a distribution whose
right tail decreases exponentially. RLs are ultimately by-products of the workings of the brain, and
further, of the firing patterns of heavily interconnected neural assemblies. From this perspective, exponential
tails would be a rather surprising outcome for RL distributions [5]. They would imply that the RLs were
generated by a Poisson process, that is, they would be independent events, despite the interconnections
between the neurons that generated them.

More in line with the probably correlated origins of behavioral events, two recent theories have predicted
that the right tails of RL distributions should follow a power-law [5, 6]. This is to say that for all times $t$
greater than a certain $t_{\min}$, their probability density function should be that of a Pareto distribution:

$$p(t) = \frac{\alpha - 1}{t_{\min}} \left( \frac{t}{t_{\min}} \right)^{-\alpha}, \qquad t \geq t_{\min},$$

where $\alpha$ is referred to as the scaling parameter, and it corresponds to the (negative) slope of the straight line that
is formed by the density function when plotted on log-log scale. A more precise theoretical proposal [6] is
that RLs arise as the result of the ratio of two correlated normal variables: The excitability of the response
effector, and the strength of the signal that excited it. Therefore the distribution of RLs should follow a
normal ratio distribution (NRD [7]). This has the further implication that the power-law right tail should
have a value of the scaling parameter of exactly two [8, 9]. Such a precise tail behavior would hold irrespective
of the properties of the task. It would constitute a complete description of the RL distribution in the far
right tail, in the strong sense of having zero degrees of freedom. The scaling parameter value would therefore
represent a fundamental constant of the cognitive system. Furthermore, it would group RLs with other
well-known natural systems with identical properties, such as Ising models of ferromagnetic materials close
to their critical temperature [8] or the intervening times between major earthquakes [10].
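The claim that a ratio of two correlated normal variables has a power-law right tail with a scaling parameter close to two can be checked numerically. The following is a minimal sketch, not the analysis used in this study: the means, standard deviations, correlation, sample size and the threshold of 50 medians are arbitrary illustrative assumptions, and the exponent is estimated with the standard continuous maximum-likelihood (Hill-type) estimator.

```python
import math
import random
import statistics


def sample_ratio_latencies(n, mu_x=1.0, sd_x=0.3, mu_y=1.0, sd_y=0.5, rho=0.5, seed=1):
    """Draw n latencies as the ratio X/Y of two correlated normal variables.

    All parameter values are arbitrary illustrations, not fitted values.
    Only positive ratios are kept, since a latency cannot be negative.
    """
    rng = random.Random(seed)
    latencies = []
    while len(latencies) < n:
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        y = mu_y + sd_y * z1
        # x shares the component z1 with y, giving correlation rho
        x = mu_x + sd_x * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        if y != 0.0 and x / y > 0.0:
            latencies.append(x / y)
    return latencies


def hill_alpha(latencies, t_min):
    """Continuous maximum-likelihood estimate of the density exponent above t_min.

    Returns the estimate and the number of tail points it is based on.
    """
    tail = [t for t in latencies if t >= t_min]
    alpha = 1.0 + len(tail) / sum(math.log(t / t_min) for t in tail)
    return alpha, len(tail)


latencies = sample_ratio_latencies(200_000)
median = statistics.median(latencies)
alpha_hat, n_tail = hill_alpha(latencies, t_min=50 * median)
print(f"median={median:.3f}  tail points={n_tail}  alpha_hat={alpha_hat:.2f}")
```

With these assumed parameters the estimate should fall near two, and moving `t_min` closer to the median would be expected to push it upwards, since the power-law regime is only reached in the far tail.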

Obtaining estimates of the distributions in the far right tail requires very large numbers of ideally un- 
truncated responses. I analyzed six large-scale databases of human responses across experimental tasks and 
modalities, and at different time ranges. The datasets included ocular fixation and blink durations during 
reading (The Dundee Corpus [11]), spontaneous ocular fixation durations while participants were inspecting
photographs (DOVES database [12]), and a sample of different web-collected experiments extracted from the
PsychExperiments [13] web site. This last set included two-choice decision reaction times to both auditory
(tones) and visual (colours) stimuli, reaction times of participants performing a mental rotation task, and 
the times that participants took to exit a virtual maze. 

The solid dots in the left panel of Fig. 1 plot (in log-log scale) the histograms of the RLs in each of 
the datasets, aggregated across participants. Notice that all six distributions show a very similar pattern: 
The probabilities of the faster latencies rise to a peak, from which they decrease, gradually approaching a 
straight line, which is the characteristic signature of a power-law distribution. As predicted, the straight 
line components seem remarkably parallel across the six datasets, with a slope of approximately minus 
two (black dot-dashed lines). The right panel in Fig. 1 further stresses this apparent invariance. It plots 
the corresponding distributions when the times have been divided by their medians so as to remove the 
scale-dependent component of the distributions. One can distinguish three phases in the distributions. 
The early times rise to a peak, following very different patterns for each dataset. From the mode up to 
somewhere between five and forty times the median, there is a transition phase where the distributions 
gradually approach a power-law. The precise speed of convergence to the power-law varies depending on 
the properties of the participant and the task [5J |S]. From this point onwards - as stressed by the inset 
panel in the figure - the distribution of latencies is approximately the same, regardless of the particular 
experimental task. To confirm that this pattern holds when one considers only single-participant data, the 
figures also plot the histogram of the responses of an individual participant in the Dundee dataset (open red 
circles; these correspond to participant "sd", but the pattern also holds for all other participants). Finally, 
in order to illustrate the theoretical prediction across the whole range of latencies, the figure also includes
the theoretical density that would be predicted by an instance of the NRD with arbitrary parameters (black
solid lines), and the histogram of a sample drawn from that density (grey open circles).

The histograms in Fig. 1 seem consistent with the hypothesis that the right tails of latency distributions 
follow a power-law with a scaling parameter of two, and most certainly discard the traditional assumption of 
a light, exponential-type tail. However, other heavy-tailed distributions could also produce histograms with 
this appearance, and this has given rise to disagreements with respect to the precise nature of heavy tails 
in some datasets. Therefore, the hypothesis needs to be contrasted with other possible distributions with 
similarly heavy tails. Both log-normal and stretched-exponential (i.e., Weibull) tailed distributions also give
rise to very heavy tails [14, 15, 16], and both have been proposed as plausible theoretical or empirical models
for RL distributions [17, 3]. In addition, as I predict that the power-law should have a scaling parameter of
exactly two, any other power-law with an arbitrary scaling parameter - not necessarily, but also including 
two - could be an alternative description 0|. 

Tab. 1 summarises the posterior evidence supporting the hypothesis that the right tails follow a power 
law distribution with a scaling parameter of exactly two over each of the other three candidate hypotheses 
[18]. For four out of the six aggregated datasets, and for the individual participant analysis, the evidence
supports the hypothesis over the three competing candidates (i.e., positive values in the table). In the
remaining two cases (negative values, highlighted in bold), the best candidate distribution was a power-law 
with an arbitrary value of the scaling parameter. In both of these cases, it seems like the optimal value of 
the scaling parameter estimated under a general power-law hypothesis has a value above two (last row in 
the table). 

The model comparison method was particularly stringent on the target hypothesis. The implicit truncation
(see Supplementary Materials) present in the data could lead to the over-estimation of the scaling
parameter that was found for two of the datasets. To investigate this possibility, I fitted an NRD to the
RLs in the Maze dataset, as this was the one for which the theory showed the worst performance. From the 
fitted distribution, I generated a sample of artificial RLs of the same size as the Maze dataset, sampling only 
points below 50 times the median (this is equivalent to an upper truncation at around eight minutes). The 
sample was discretised to simulate a measurement resolution of one ms. Fig. 2 compares the original data 
with the sample and the fitted distribution. Although these simulated data originated from a power-law 
tailed distribution with a true scaling parameter of exactly two, applying the hypothesis testing procedure 
revealed a very similar pattern to what was observed in the Maze data (see the last column of Tab. 1). 
All three alternative hypotheses seemed more probable than the target (and correct) hypothesis due to the
advantage that truncation gives them. Given the quality of the fit in Fig. 2, it seems likely that the same 
distortion took place in the Maze data. In sum, all datasets were consistent with the theoretical prediction. 
The theory was the best of the four candidate theories for the majority of the datasets studied, all of which 
showed power-law tails. One cannot fully discard that the power-law tail may be subject to a cutoff at some 
unknown high value. However, lacking a specific value for the location of the cutoff is equivalent to stating 
that the power-law regime continues indefinitely, which is also the most parsimonious assumption by the 
Principle of Maximum Entropy [19].
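The direction of the bias introduced by an upper truncation can be illustrated with a small simulation. This is a simplified sketch rather than the hypothesis testing procedure described in this paper: it samples from an exact continuous power-law with a scaling parameter of two, and it uses an estimator that ignores the truncation altogether. The threshold of 10 medians and the cutoff of 50 medians mirror the values reported for the Maze simulation; the sample size and the seed are arbitrary assumptions.

```python
import math
import random


def pareto_sample(n, t_min, alpha, rng):
    """Inverse-CDF sampling from a continuous power law with density exponent alpha."""
    return [t_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def naive_alpha(tail, t_min):
    """Maximum-likelihood exponent that ignores any upper truncation."""
    return 1.0 + len(tail) / sum(math.log(t / t_min) for t in tail)


rng = random.Random(7)
t_min, t_max = 10.0, 50.0  # both expressed in units of the median
full = pareto_sample(20_000, t_min, alpha=2.0, rng=rng)
# Keep only the values that a study truncated at t_max would have recorded
truncated = [t for t in full if t <= t_max]

alpha_full = naive_alpha(full, t_min)
alpha_truncated = naive_alpha(truncated, t_min)
print(f"untruncated: {alpha_full:.2f}   truncated at 50 medians: {alpha_truncated:.2f}")
```

Under these assumptions the untruncated estimate stays close to two, whereas the truncated one is expected to be around 2.7, that is, the tail looks lighter than it really is.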

Power-laws are often interpreted as evidence for Self-Organizing Criticality (SOC [20]), but several other
mechanisms could also give rise to power-laws without the explicit need for self-organization [21] [S]. In the
domain of human RLs, some authors have argued for the presence of SOC using evidence for 1/f 'pink' noise
in the frequency spectra of RLs [351 1131 [IS] , but this evidence is currently subject to discussion [3H1 H?) . 
The fixed scaling parameter of two is common to a prototypical model of a system that is known to be in a 
critical state: The Ising model of a magnet [8]. This model is also described by an NRD, originating from 
the fractional change in magnetization (Δm/m). In a small neighbourhood of its critical temperature,
the Ising model exhibits power-law behavior, but very small deviations from the critical temperature restrict 
the power-law to the very far tails [Hj. This is very much what I have observed in the RLs. The power-law 
behavior settles at the extreme right tails, between five and forty times the median RL in a particular task. 
Rather than evidence for SOC, the results in fact argue for a system that has been pushed slightly away 
from its critical point. This suggests that, at rest, the system is likely to be in a state which could be 
characterised as SOC, but the presentation of stimuli disturbs this criticality. This picture is consistent with 
recent work on electro-physiology. Human (and animal) neural oscillations are generally at a critical state, 
characterised by both power-laws and 1/f noise patterns, but transient synchronization of neural assemblies
during cognitive processing can temporarily disturb this criticality [28]. The power-law with exponent two
provides a characterisation of this critical state. Measuring the magnitudes of deviations from this resting 
state elicited by different conditions can provide a direct measure of the amounts of information processing 
they involve, considered here as a relaxation in return to the critical state. 

It comes as no surprise that human behavior, given its neural origins, should be best described by a 
complex system. It has recently been suggested that scale-invariance, as expressed by power-laws, may 
constitute a "universal principle" governing human cognition [29], and biological systems in general [30], a
view that is supported by this study.

Methods 

For an objective rationale to choose among the four possible explanations for the heavy tails, I used pairwise Bayes factors
[18] between the log-likelihoods for each candidate hypothesis. The distribution proposed here has no free parameters, thus the
computation of the log-likelihood for a fixed value of $t_{\min}$ is straightforward (see additional materials for details on the selection
of $t_{\min}$). However, the other three candidate hypotheses have either one (for the general power-law case) or two free parameters
(the log-normal and Weibull tail cases). For these, the log-likelihood was computed by numerical integration on the parameter
space, assuming truncated uninformative (i.e., Jeffreys') priors for the free parameters. The truncation was designed to favour
the three alternative hypotheses over the power-law proposed here. The integration space was restricted to plausible values of
the parameters: For the general power-law hypothesis, I assumed that the scaling parameter should take a value greater than
one and smaller than six, as power-laws with scaling parameters greater than this are seldom observed in natural phenomena
[16, 15, 21]. For the Weibull hypothesis, I assumed that the value of its shape parameter should never be above one, as this
would imply an exponential tail or lighter, which cannot correspond to the pattern observed in the histograms. Finally, both
the Weibull location and the log-normal location and scale parameters were restricted to values that would make the datapoints
correspond to an actual right tail (i.e., their mode should fall to the left of $t_{\min}$) and have a peak within the range of RLs.
Note that these constraints actually increase the likelihood of the alternative hypotheses beyond the under-specification that
is found in the literature, thus providing conservative estimates of the evidence in support of our hypothesis. Further technical
details are provided in the Supplementary Materials.
Table 1: Posterior evidence in favour of a power-law with exponent two over each of the alternative
hypotheses for the datasets analyzed, as well as for the artificially generated data simulating the Maze RLs.
Positive values indicate support for the power-law with $\alpha = 2$, while negative values indicate evidence in
favour of each of the alternative hypotheses. The first two rows indicate the value of $t_{\min}$ in relation to the
corresponding median, and the number of points above this threshold found in each dataset. The last row
indicates the posterior estimate of $\alpha$ if one assumed the general (unrestricted exponent) power-law hypothesis
to be true.

| | Dundee (whole set) | DOVES | Two-choice RT (auditory) | Two-choice RT (visual) | Rotation | Maze | Dundee (single participant) | Artificial Data |
|---|---|---|---|---|---|---|---|---|
| $t_{\min}$ / median($t$) | | 10 | | 5 | 40 | 10 | 5 | 10 |
| Number of $t > t_{\min}$ | 33 | 57 | 133 | 544 | 31 | | 27 | 366 |
| Log-Normal | 5.5 dB | 2 dB | -6.5 dB | 13.5 dB | 13 dB | -257.5 dB | 9 dB | -15 dB |
| Weibull | 2 dB | 2 dB | 31 dB | 43 dB | 26 dB | -241.5 dB | 11.5 dB | -7 dB |
| Power-Law (general) | 2 dB | 3 dB | -22.5 dB | 9.5 dB | 5.5 dB | -261 dB | 5.5 dB | -21 dB |
| $\hat{\alpha}$ | 2.56 | 1.82 | 2.50 | 1.92 | 2.27 | 2.99 | 2.17 | 2.48 |
[Figure 1. x-axis label of the right panel: Normalized Time (number of medians)]

Figure 1: Histograms of the latencies in seconds in the six datasets plotted in log-log scale. The solid dots
represent the six datasets aggregated across participants. The open red circles plot the histogram from a 
single participant from the Dundee dataset. The open grey circles plot a sample from an arbitrary instance 
of the NRD, whose density corresponds to the solid black lines. The dot-dashed black lines illustrate a slope 
of -2. Both panels represent the same data either on the true time scale (left panel), or in the time scale 
normalised by the corresponding median (right panel). The inset on the right panel magnifies the power-law
right tail of the distributions. 
[Figure 2. x-axis label: Time (seconds)]



Figure 2: Histograms (in log-log scale) of the RLs in the Maze dataset (black solid dots) and of the simulated 
artificial dataset (grey open circles). The solid grey line plots the Maximum Likelihood NRD fit to the Maze 
dataset from which the simulated points were sampled. 
Supplementary Materials



Supplementary Methods

Materials The data from PsychExperiments correspond in their greater part to latencies of psychology
students performing the experiments during in-class sessions, but some residual component of latencies
originating from persons 'trying out' the system is also present. In order to minimise as much as possible
the distortion of the distributions introduced by users testing the system, only RLs from participants whose
number of responses in the database corresponds exactly to the number of stimuli in the experiment were kept.
In addition, all RLs with values equal to or smaller than zero, or for which some field of the downloaded file
contained an inconsistent value (e.g., an incorrect task descriptor), were removed prior to the analyses.
Descriptive statistics of the resulting datasets are provided in Supplementary Table 1.

The experimental RLs were measured to a precision of 1 msec, except for the ocular RLs from DOVES,
which had a precision of 5 msec (i.e., the eye-tracking equipment used a sampling rate of 200 Hz). A similar
discretization was performed on the artificially generated datasets, simulating an experimental resolution of
1 msec, relative to a median of 9,278 msec, estimated by maximum likelihood.



Histograms To obtain a smoother estimate of the far-right tails of the histograms I used logarithmically 
increasing bin widths for the histograms [21]. Note that this technique over-estimates the densities in the 
lower time ranges when the data have been discretised by the measurement apparatus. 
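Logarithmic binning of this kind can be sketched as follows. The function name and the choice of five bins per decade are illustrative assumptions, not the settings used for the figures. Each count is divided by the total number of observations and by the width of its bin, so that the result is a density estimate that can be compared across bins of different widths.

```python
import math


def log_binned_density(values, bins_per_decade=5):
    """Histogram with logarithmically increasing bin widths.

    Returns (left edge, right edge, density) triples for the non-empty bins,
    where density is the fraction of observations in the bin divided by the
    bin width.
    """
    positive = [v for v in values if v > 0]
    lo = math.floor(math.log10(min(positive)) * bins_per_decade)
    hi = math.floor(math.log10(max(positive)) * bins_per_decade) + 1
    counts = [0] * (hi - lo)
    for v in positive:
        counts[math.floor(math.log10(v) * bins_per_decade) - lo] += 1
    result = []
    for k, count in enumerate(counts):
        left = 10 ** ((lo + k) / bins_per_decade)
        right = 10 ** ((lo + k + 1) / bins_per_decade)
        if count:
            result.append((left, right, count / (len(positive) * (right - left))))
    return result


for left, right, density in log_binned_density([0.15, 0.18, 0.2, 0.25, 0.4, 1.5, 9.0]):
    print(f"[{left:.3f}, {right:.3f})  density={density:.4f}")
```

Plotting the logarithm of the density against the logarithm of the geometric bin centre, `math.sqrt(left * right)`, would give the kind of log-log histogram described in the text.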

Estimation of the evidence For any two candidate models $M_1$ and $M_2$, if $\mathbf{t} = \{t_1, \ldots, t_N\}$ are the
latencies in the dataset that are longer than $t_{\min}$, the evidence in favor of $M_1$ over $M_2$ is:

$$E(M_1, M_2 | \mathbf{t}) = 10 \log_{10} \frac{P(M_1 | \mathbf{t})}{P(M_2 | \mathbf{t})} = 10 \log_{10} \frac{p(\mathbf{t} | M_1)\, P(M_1)}{p(\mathbf{t} | M_2)\, P(M_2)}, \qquad \text{(S1)}$$

where the second step is achieved by application of Bayes' Theorem, and $p(\mathbf{t} | M_i)$ is the likelihood of the
datapoints for model $M_i$. If a priori we consider both models equally probable, the evidence reduces to the
difference in log-likelihoods:

$$E(M_1, M_2 | \mathbf{t}) = 10 \left[ \log_{10} p(\mathbf{t} | M_1) - \log_{10} p(\mathbf{t} | M_2) \right] = 10 \left[ \ell(\mathbf{t} | M_1) - \ell(\mathbf{t} | M_2) \right]. \qquad \text{(S2)}$$

I used decimal logarithms and the scaling factor of 10 so that the resulting evidence would be measured in
decibels [31].
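As a small worked illustration of the decibel scale, the sketch below converts a pair of decimal log-likelihoods into evidence, rounds it to the nearest half-decibel, and converts decibels back into odds. The function names and the numerical inputs are hypothetical and chosen only for the example; the optional prior odds term corresponds to the case in which the two models are not considered equally probable a priori.

```python
import math


def evidence_db(log10_lik_1, log10_lik_2, prior_odds=1.0):
    """Evidence for model 1 over model 2, in decibels.

    log10_lik_1 and log10_lik_2 are decimal log-likelihoods of the data under
    each model; prior_odds is P(M1) / P(M2), equal to 1 for equiprobable models.
    """
    return 10.0 * (log10_lik_1 - log10_lik_2) + 10.0 * math.log10(prior_odds)


def round_half_db(evidence):
    """Round an evidence value to the nearest half-decibel."""
    return round(2.0 * evidence) / 2.0


def db_to_odds(evidence):
    """Convert evidence in decibels back to odds in favour of model 1."""
    return 10.0 ** (evidence / 10.0)


# Hypothetical log-likelihoods: model 1 is 10**0.93 times more likely than model 2
e = evidence_db(-250.00, -250.93)
print(round_half_db(e), "dB, odds of about", round(db_to_odds(e), 1), "to 1")
```

On this scale 10 dB corresponds to odds of 10 to 1 and 20 dB to odds of 100 to 1, while negative values favour the second model.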



Log-likelihood for the target distribution For the power-law with a pre-determined scaling parameter
$\alpha = 2$, the computation of $\ell$ is straightforward:

$$\ell(\mathbf{t} | \alpha = 2) = \log_{10} \prod_{i=1}^{N} p(t_i | t_{\min}, \alpha = 2) = \sum_{i=1}^{N} \log_{10} p(t_i | t_{\min}, \alpha = 2), \qquad \text{(S3)}$$

where $p(t_i | t_{\min}, \alpha = 2)$ is the density function of a discrete power-law (normalised for an upper truncation
at $\max\{\mathbf{t}\}$).
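One way to compute such a log-likelihood is sketched below. It assumes that latencies are integers in the measurement unit (for example, milliseconds) and that the discrete power-law assigns to each integer between the threshold and the largest observation a probability proportional to $t^{-\alpha}$; this particular form of the probability function, like the function names, is an assumption made for the example.

```python
import math


def log10_pmf(t, t_min, t_max, alpha=2.0):
    """log10 probability of an integer latency t under a discrete power law
    supported on the integers t_min, ..., t_max."""
    norm = sum(s ** -alpha for s in range(t_min, t_max + 1))
    return -alpha * math.log10(t) - math.log10(norm)


def log10_likelihood(latencies, t_min, alpha=2.0):
    """Decimal log-likelihood of the latencies at or above t_min, with the
    upper truncation placed at the largest observed latency."""
    tail = [t for t in latencies if t >= t_min]
    t_max = max(tail)
    # The normalising constant is shared by all points, so it is computed once
    log_norm = math.log10(sum(s ** -alpha for s in range(t_min, t_max + 1)))
    return sum(-alpha * math.log10(t) - log_norm for t in tail)


# Hypothetical latencies in milliseconds, with a threshold of 1000 ms
example = [420, 1000, 1210, 1530, 2480, 3970, 9050]
print(log10_likelihood(example, t_min=1000))
```

Because the scaling parameter is fixed, no parameter is fitted here, and the value returned can be entered directly into the evidence computation.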



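Log-likelihood for the alternative hypotheses For a hypothesis $M$ with free parameters $\theta = (\theta_1, \ldots, \theta_k)$,
the marginal log-likelihood of the hypothesis is given by:

$$\ell(\mathbf{t} | M) = \log_{10} \int_{V(\theta)} p(\mathbf{t}, \theta | M)\, d\theta = \log_{10} \int_{V(\theta)} p(\mathbf{t} | \theta, M)\, p(\theta | M)\, d\theta, \qquad \text{(S4)}$$

where $V(\theta)$ is the volume defined by the parameters, $p(\theta | M)$ is the prior on $\theta$ and:

$$p(\mathbf{t} | \theta, M) = \prod_{i=1}^{N} p(t_i | \theta, M), \qquad \text{(S5)}$$

where $p(t_i | \theta, M)$ is given by the density function of $M$.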

For each hypothesis, I estimated the integral in Eq. S4 numerically using a Monte Carlo technique. I
sampled 10^ points from the prior distribution $p(\theta | M)$, computed the likelihood of $\mathbf{t}$ using the sampled
values of $\theta$, and took the mean result as the marginal likelihood. For each of the three distributions, I used
discrete versions of their densities, truncated between $t_{\min}$ and $\max\{\mathbf{t}\}$.
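A Monte Carlo estimate of this kind can be sketched for the general power-law hypothesis, which has a single free parameter. The sketch makes several assumptions that are not taken from the analyses reported here: the prior on the scaling parameter is taken to be proportional to $1/\alpha$ on the interval from one to six, the number of draws (2000) is arbitrary, and the discrete power-law assigns to each integer latency a probability proportional to $t^{-\alpha}$. The mean of the sampled likelihoods is computed in logarithmic scale to avoid numerical underflow.

```python
import math
import random


def log10_lik_power_law(tail, t_min, t_max, alpha):
    """Decimal log-likelihood of integer latencies under a discrete power law
    truncated to the integers t_min, ..., t_max."""
    log_norm = math.log10(sum(s ** -alpha for s in range(t_min, t_max + 1)))
    return sum(-alpha * math.log10(t) - log_norm for t in tail)


def log10_mean_exp10(values):
    """log10 of the mean of 10**v, computed without leaving the log scale."""
    peak = max(values)
    return peak + math.log10(sum(10.0 ** (v - peak) for v in values) / len(values))


def marginal_log10_lik(tail, t_min, n_draws=2000, alpha_range=(1.0, 6.0), seed=0):
    """Monte Carlo estimate of the marginal log-likelihood in Eq. S4."""
    rng = random.Random(seed)
    t_max = max(tail)
    lo, hi = alpha_range
    draws = []
    for _ in range(n_draws):
        # Assumed prior: density proportional to 1/alpha on (lo, hi),
        # sampled by inverse CDF as alpha = lo * (hi / lo) ** u
        alpha = lo * (hi / lo) ** rng.random()
        draws.append(log10_lik_power_law(tail, t_min, t_max, alpha))
    return log10_mean_exp10(draws)


tail = [10, 11, 12, 14, 17, 20, 25, 33, 50, 100]  # hypothetical tail, arbitrary units
print(marginal_log10_lik(tail, t_min=10))
```

The marginal log-likelihood obtained in this way is always below the log-likelihood at the best-fitting value of the parameter, which is how the additional flexibility of a hypothesis with free parameters is penalised.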

In order to minimise numerical errors, all the computations above were performed in logarithmic scale. 
Evidence values were rounded to half-decibels, as this is argued to be the minimal perceptible difference in
evidence [31].

Selection of $t_{\min}$ The threshold value $t_{\min}$ was chosen by visual inspection of the histograms in Fig. 1. In
addition, these choices were validated by assessing whether the chosen value would be close to the one that
would minimise the value of the Kolmogorov-Smirnov statistic between the sample of RLs and a general
power-law [16], which in all cases suggested similar values. Note, however, that this objective method can be
problematic when one considers the additional upper truncation that is present in our data.
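The Kolmogorov-Smirnov criterion can be sketched as follows. For simplicity the sketch uses a continuous, untruncated power-law, so it differs from the discrete truncated densities used in the model comparison; the candidate thresholds, the minimum tail size of 25 points and the function names are illustrative assumptions.

```python
import math
import random


def fit_alpha(tail, t_min):
    """Continuous maximum-likelihood estimate of the scaling parameter."""
    return 1.0 + len(tail) / sum(math.log(t / t_min) for t in tail)


def ks_distance(tail, t_min, alpha):
    """Largest gap between the empirical CDF of the tail and the CDF of a
    continuous power law with the given threshold and scaling parameter."""
    ordered = sorted(tail)
    n = len(ordered)
    worst = 0.0
    for i, t in enumerate(ordered):
        model = 1.0 - (t / t_min) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at t; check both sides
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst


def select_t_min(latencies, candidates, min_tail=25):
    """Return (t_min, KS distance, fitted alpha) for the candidate threshold
    that minimises the KS distance, or None if no candidate has enough points."""
    best = None
    for t_min in candidates:
        tail = [t for t in latencies if t >= t_min]
        if len(tail) < min_tail:
            continue
        alpha = fit_alpha(tail, t_min)
        distance = ks_distance(tail, t_min, alpha)
        if best is None or distance < best[1]:
            best = (t_min, distance, alpha)
    return best


rng = random.Random(3)
sample = [(1.0 - rng.random()) ** -1.0 for _ in range(2000)]  # exact power law, alpha = 2
print(select_t_min(sample, [1.0, 2.0, 5.0]))
```

Since the fitted distribution in this sketch has no upper bound, applying it to truncated data would be subject to the same problem noted above.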



Supplementary Table 1: Description and summary statistics of the datasets used in the analyses.

| Task | Reading | Visual inspection | Two-choice RT | Two-choice RT | Mental rotation | Maze | Reading (single subject) | Artificial (truncated) |
|---|---|---|---|---|---|---|---|---|
| Stimulus | visual, dynamic | visual, static | auditory, static | visual, static | visual, static | visual, dynamic | visual, dynamic | |
| Response | ocular, simple | ocular, simple | manual, simple | manual, simple | manual, simple | manual, complex | ocular, simple | |
| N. responses | 420,809 | 35,500 | 17,800 | 107,360 | 589,440 | 63,090 | 50,731 | 63,090 |
| N. participants | 10 | 29 | 890 | 5,368 | 12,280 | 4,206 | 1 | |
| Resp./Part. | 42,080.90 | 1,224.13 | 20.00 | 20.00 | 48.00 | 15.00 | 50,731 | |
| Minimum | 0.012 s | 0.100 s | 0.010 s | 0.030 s | 0.021 s | 2.007 s | 0.012 s | 2.528 s |
| Maximum | 8.901 s | 5.200 s | 108.657 s | 172.211 s | 3,561.324 s | 1,331.391 s | 8.901 s | 460.460 s |
| Mean t | 0.195 s | 0.372 s | 0.590 s | 0.493 s | 3.478 s | 13.101 s | 0.195 s | 12.483 s |
| Median t | 0.188 s | 0.295 s | 0.661 s | 0.416 s | 2.692 s | 9.019 s | 0.188 s | 9.273 s |



Supplementary Notes

• Precision of on-line RL measurements The datasets from PsychExperiments were collected online
using authorware programs. In general, the precision of these measurements is approximately equivalent
to the precisions obtained in traditional laboratory testing sessions (1 ms), except for cases in
which the client computer has an exceptionally large number of concurrent processes running [13, 32].

• Implicit truncation An additional problem that affects the comparison between the different theories 
is the right-truncation that is present in the data. This introduces a bias in favor of lighter tails: It 
favors either of the non-power-law distributions, or the general power-law with a high value of the 
scaling parameter. In both of the eye movement datasets, fixations are determined by computer 
algorithms that rely on aspects such as their duration, therefore implicitly introducing both left and 
right truncations. In the internet collected datasets, there is officially no right truncation in the data. 
However, considering that the great majority of the data were collected during in-class training sessions, 
most responses will be subject to an implicit upper bound dictated by the duration of the class. The 
residual of responses longer than one class session are very likely to originate in users testing the system, 
system failures in the client, etc. To attenuate this problem, I assumed that all datasets had been
right-truncated at the maximum RL observed. For the web collected datasets this is still an under- 
estimation of the real truncation point, leaving some advantage for the non-power-law distributions, 
and over-estimating the scaling parameter for power-laws. This problem is more acute in the longer 
RL datasets, where the truncation point comes closer to the magnitudes of valid observed RLs. 
Supplementary references

31. E. T. Jaynes, Probability Theory: The Logic of Science (Cambridge University Press, Cambridge, 